171 research outputs found

    Identity Testing Under Label Mismatch

    Testing whether the observed data conforms to a purported model (probability distribution) is a fundamental statistical task, and one that is by now well understood. However, the standard formulation, identity testing, fails to capture many settings of interest; in this work, we focus on one such natural setting, identity testing under promise of permutation. In this setting, the unknown distribution is assumed to be equal to the purported one, up to a relabeling (permutation) of the model; however, due to a systematic error in the reporting of the data, this relabeling may not be the identity. The goal is then to test identity under this assumption: equivalently, whether this systematic labeling error led to a data distribution statistically far from the reference model.

    Testing Data Binnings


    Invariance principle on the slice

    We prove an invariance principle for functions on a slice of the Boolean cube, which is the set of all vectors in {0,1}^n with Hamming weight k. Our invariance principle shows that a low-degree, low-influence function has similar distributions on the slice, on the entire Boolean cube, and on Gaussian space. Our proof relies on a combination of ideas from analysis and probability, algebra and combinatorics. Our result implies a version of majority is stablest for functions on the slice, a version of Bourgain's tail bound, and a version of the Kindler-Safra theorem. As a corollary of the Kindler-Safra theorem, we prove a stability result of Wilson's theorem for t-intersecting families of sets, improving on a result of Friedgut. Comment: 36 pages.

    Approximate resilience, monotonicity, and the complexity of agnostic learning

    A function $f$ is $d$-resilient if all its Fourier coefficients of degree at most $d$ are zero, i.e., $f$ is uncorrelated with all low-degree parities. We study the notion of approximate resilience of Boolean functions, where we say that $f$ is $\alpha$-approximately $d$-resilient if $f$ is $\alpha$-close to a $[-1,1]$-valued $d$-resilient function in $\ell_1$ distance. We show that approximate resilience essentially characterizes the complexity of agnostic learning of a concept class $C$ over the uniform distribution. Roughly speaking, if all functions in a class $C$ are far from being $d$-resilient then $C$ can be learned agnostically in time $n^{O(d)}$ and conversely, if $C$ contains a function close to being $d$-resilient then agnostic learning of $C$ in the statistical query (SQ) framework of Kearns has complexity of at least $n^{\Omega(d)}$. This characterization is based on the duality between $\ell_1$ approximation by degree-$d$ polynomials and approximate $d$-resilience that we establish. In particular, it implies that $\ell_1$ approximation by low-degree polynomials, known to be sufficient for agnostic learning over product distributions, is in fact necessary. Focusing on monotone Boolean functions, we exhibit the existence of near-optimal $\alpha$-approximately $\widetilde{\Omega}(\alpha\sqrt{n})$-resilient monotone functions for all $\alpha>0$. Prior to our work, it was conceivable even that every monotone function is $\Omega(1)$-far from any $1$-resilient function. Furthermore, we construct simple, explicit monotone functions based on ${\sf Tribes}$ and ${\sf CycleRun}$ that are close to highly resilient functions. Our constructions are based on a fairly general resilience analysis and amplification. These structural results, together with the characterization, imply nearly optimal lower bounds for agnostic learning of monotone juntas.
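The exact (non-approximate) resilience condition defined at the start of this abstract can be checked by brute force for small n. The following is only an illustrative sketch, not code from the paper: it assumes the function is given as a Python callable on points of {-1,1}^n, and the names `fourier_coefficient` and `is_resilient` are hypothetical.

```python
from itertools import combinations, product
from math import prod


def fourier_coefficient(f, n, S):
    """Fourier coefficient of f on the index set S: the average of
    f(x) * prod_{i in S} x_i over all x in {-1, 1}^n."""
    total = sum(f(x) * prod(x[i] for i in S)
                for x in product((-1, 1), repeat=n))
    return total / 2 ** n


def is_resilient(f, n, d, tol=1e-12):
    """True if every Fourier coefficient of degree at most d vanishes
    (up to the numerical tolerance tol, an assumption of this sketch)."""
    return all(abs(fourier_coefficient(f, n, S)) <= tol
               for size in range(d + 1)
               for S in combinations(range(n), size))


# Parity on 3 bits has a single nonzero coefficient, at degree 3.
parity3 = lambda x: x[0] * x[1] * x[2]
# Majority on 3 bits is balanced but correlates with each single bit.
maj3 = lambda x: 1 if sum(x) > 0 else -1
```

With these definitions, `is_resilient(parity3, 3, 2)` holds while `is_resilient(maj3, 3, 1)` does not, since each degree-1 coefficient of the 3-bit majority equals 1/2. The enumeration costs on the order of 2^n evaluations per coefficient, so this is meant for intuition on tiny examples only.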

    Once in a summer: Fall history of the JaH 073 strewn field, Sultanate of Oman

    Modeling of a prehistoric fall can be successful if a strewn field is very well documented and coordinates, masses, and shapes of all individual stones are recorded. In combination with meteoroid mass and wind model constraints, a detailed scenario of the atmospheric passage is obtained for the ~20 × 6 km-sized JaH 073 L6 strewn field in Oman. The wide mass ranges from 52.2 kg to <1 g together with the large number of ~3500 stones offer the statistical basis to reconstruct the trajectory and the fragmentation sequence. The size of the meteoroid, constrained by noble gas analyses, corresponds to an initial mass of about 12 t at atmospheric entry using an L-chondrite bulk density of 3400–3500 kg m⁻³. Assuming typical ablation behavior, these data are compatible with an entry velocity of 20 ± 3 km s⁻¹. The best model fit is achieved for a serial fragmentation scenario starting at an altitude of ~34 km and showing a main fragmentation at 26 km. A resolved event seems to have occurred at 22 km, followed by a more diffuse fragmentation at 19 km. The vertical trajectory angle is calculated at 43 ± 2° and the azimuth at 329 ± 1°. The position of numerous outlying meteorites in the strewn field can only be reproduced by repeated fragmentation with cumulated transverse velocities from explosive events. The wind model adopted from modern data fits surprisingly well and indicates summer monsoon with strong easterly winds during the fall event, consistent with paleoclimatic data.

    Testing k-Monotonicity

    A Boolean k-monotone function defined over a finite poset domain D alternates between the values 0 and 1 at most k times on any ascending chain in D. Therefore, k-monotone functions are natural generalizations of the classical monotone functions, which are the 1-monotone functions. Motivated by the recent interest in k-monotone functions in the context of circuit complexity and learning theory, and by the central role that monotonicity testing plays in the context of property testing, we initiate a systematic study of k-monotone functions in the property testing model. In this model, the goal is to distinguish functions that are k-monotone (or are close to being k-monotone) from functions that are far from being k-monotone. Our results include the following: 1. We demonstrate a separation between testing k-monotonicity and testing monotonicity, on the hypercube domain {0,1}^d, for k >= 3; 2. We demonstrate a separation between testing and learning on {0,1}^d, for k = omega(log d): testing k-monotonicity can be performed with 2^{O(sqrt d . log d . log(1/eps))} queries, while learning k-monotone functions requires 2^{Omega(k . sqrt d . 1/eps)} queries (Blais et al. (RANDOM 2015)); 3. We present a tolerant test for functions f: [n]^d -> {0,1} with complexity independent of n, which makes progress on a problem left open by Berman et al. (STOC 2014). Our techniques exploit the testing-by-learning paradigm, use novel applications of Fourier analysis on the grid [n]^d, and draw connections to distribution testing techniques.
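To make the chain-based definition concrete, here is a small brute-force sketch on the hypercube {0,1}^d. It is not the tester from the paper: it reads the whole truth table, and it follows the abstract's wording literally by counting value changes along ascending chains (the paper's formal definition may treat the starting value of a chain differently). The name `max_alternations` is hypothetical.

```python
from itertools import product


def max_alternations(f, d):
    """Largest number of value changes of f along any ascending chain
    in {0,1}^d, with f given as a callable on d-tuples of bits.

    It suffices to look at chains that add one coordinate at a time:
    any ascending chain is a subsequence of such a chain, and a
    subsequence never has more value changes than the full sequence.
    """
    alt = {}
    # Visit points by increasing Hamming weight so that every point
    # directly below x has been processed before x itself.
    for x in sorted(product((0, 1), repeat=d), key=sum):
        best = 0
        for i in range(d):
            if x[i] == 1:
                y = x[:i] + (0,) + x[i + 1:]  # point just below x
                best = max(best, alt[y] + (f(y) != f(x)))
        alt[x] = best
    return max(alt.values())


# Example functions on 3 bits.
majority = lambda x: int(sum(x) >= 2)   # monotone
parity = lambda x: sum(x) % 2           # flips at every step up a chain
```

Under this reading, a function passes the k-monotonicity condition when `max_alternations(f, d) <= k`. The 3-bit majority changes value once along a chain, while the 3-bit parity changes value three times along 000, 100, 110, 111.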

    Similarities in social calls during autumn swarming may facilitate interspecific communication between Myotis bat species

    Bats employ a variety of social calls for communication purposes. However, for most species, social calls are far less studied than echolocation calls and their specific function often remains unclear. We investigated the function of in-flight social calls during autumn swarming in front of a large hibernaculum in Northern Germany, whose main inhabitants are two species of Myotis bats, Natterer’s bats (Myotis nattereri) and Daubenton’s bats (Myotis daubentonii). We recorded social calls on nights of high swarming activity, grouped the calls into ten types based on their spectro-temporal structure, and verified our visual classification by a discriminant function analysis. Whenever possible, we subsequently assigned social calls to either M. daubentonii or M. nattereri by analyzing the echolocation calls surrounding them. As many bats echolocate at the same time during swarming, we did not analyze single echolocation calls but the “soundscape” surrounding each social call instead, encompassing not only spectral parameters but also the timbre (vocal “color”) of echolocation calls. Both species employ comparatively similar social call types in a swarming context, even though there are subtle differences in call parameters between species. To additionally gain information about the general function of social calls produced in a swarming context, we performed playback experiments with free-flying bats in the vicinity of the roost, using three different call types from each species. In three out of six treatments, bat activity (approximated as echolocation call rate) increased during and after stimulus presentation, indicating that bats inspected or approached the playback site. Using a camera trap, we were sometimes able to identify the species of approaching bats. Based on the photos taken during playbacks, we assume one call type to support interspecific communication while another call type works for intraspecific group cohesion.

    The soundscape of swarming: Proof of concept for a noninvasive acoustic species identification of swarming Myotis bats

    Bats emit echolocation calls to orientate in their predominantly dark environment. Recording of species‐specific calls can facilitate species identification, especially when mist netting is not feasible. However, some taxa, such as Myotis bats can be hard to distinguish acoustically. In crowded situations where calls of many individuals overlap, the subtle differences between species are additionally attenuated. Here, we sought to noninvasively study the phenology of Myotis bats during autumn swarming at a prominent hibernaculum. To do so, we recorded sequences of overlapping echolocation calls (N = 564) during nights of high swarming activity and extracted spectral parameters (peak frequency, start frequency, spectral centroid) and linear frequency cepstral coefficients (LFCCs), which additionally encompass the timbre (vocal “color”) of calls. We used this parameter combination in a stepwise discriminant function analysis (DFA) to classify the call sequences to species level. A set of previously identified call sequences of single flying Myotis daubentonii and Myotis nattereri, the most common species at our study site, functioned as a training set for the DFA. 90.2% of the call sequences could be assigned to either M. daubentonii or M. nattereri, indicating the predominantly swarming species at the time of recording. We verified our results by correctly classifying the second set of previously identified call sequences with an accuracy of 100%. In addition, our acoustic species classification corresponds well to the existing knowledge on swarming phenology at the hibernaculum. Moreover, we successfully classified call sequences from a different hibernaculum to species level and verified our classification results by capturing swarming bats while we recorded them. 
    Our findings provide a proof of concept for a new noninvasive acoustic monitoring technique that analyses “swarming soundscapes” by combining classical acoustic parameters and LFCCs, instead of analyzing single calls. Our approach for species identification is especially beneficial in situations with multiple calling individuals, such as autumn swarming.

    Morphological and Transcriptomic Analysis of a Beetle Chemosensory System Reveals a Gnathal Olfactory Center

    OR gene tissue expression and their chromosomal localization. a Venn diagram showing the number of ORs expressed (RPKM ≥ 0.5) in the different body parts: antennae, legs, mouthparts (as piece of the head capsule anterior of the antennae), heads (the whole head capsule including mouthparts but excluding the antennae), and bodies (excluding head and legs). b Venn diagram comparing our results (yellow, green) with data from Engsontia et al. [115] (blue, red). Number of expressed ORs, defined by RPKM ≥ 0.5 (yellow), by RT-PCR (blue), not expressed RPKM < 0.5 (green), or with no RT-PCR amplicon (red). ORs of the brown group were not previously tested by Engsontia et al. c Chromosomal localization of T. castaneum ORs. Based on the Georgia GA-2 strain genome assembly 3.0 [81], only chromosomal linkage groups containing an IR or SNMP are depicted. Gene clusters are indicated by a number referring to the chromosome and a letter conveys the relative position on the chromosome. The number of genes within this cluster is indicated in the square brackets. (PDF 277 kb)